%md
[**Scaling Deep Learning Best Practices**](https://databricks-prod-cloudfront.cloud.databricks.com/public/793177bc53e528530b06c78a4fa0e086/0/14560762/100107/latest.html)
* Use a GPU
* Early Stopping
* Larger batch size with a correspondingly larger learning rate (see the sketch after this list)
* Use Petastorm
* Use Multiple GPUs with Horovod
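A minimal sketch of two of the points above, early stopping and scaling the learning rate with the batch size, using a made-up Keras model and random data (the linear scaling rule and all hyperparameters here are illustrative assumptions, not a prescribed recipe):

```python
import numpy as np
import tensorflow as tf

# Scale the learning rate with the batch size (illustrative linear scaling rule).
BASE_BATCH_SIZE, BASE_LR = 32, 1e-3
batch_size = 256                               # larger batches keep the GPU busy
lr = BASE_LR * (batch_size / BASE_BATCH_SIZE)  # grow the LR along with the batch size

# Hypothetical model and random data, just to make the sketch runnable.
X, y = np.random.rand(1000, 20), np.random.rand(1000, 1)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr), loss="mse")

# Early stopping: halt when validation loss stops improving and keep the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                              restore_best_weights=True)

model.fit(X, y, batch_size=batch_size, epochs=50, validation_split=0.2,
          callbacks=[early_stop], verbose=0)
```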
%md
[**ULMFiT - Language Model Fine-tuning**](https://arxiv.org/pdf/1801.06146.pdf)
* Discriminative Fine-Tuning: Use a different learning rate for each layer
* Slanted triangular learning rates: Linearly increase the learning rate for a short warmup, then linearly decrease it
* Gradual Unfreezing: Unfreeze the last layer and train for one epoch, then progressively unfreeze earlier layers (training after each) until all layers are fine-tuned (see the sketch after this list)
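A minimal sketch of the gradual-unfreezing schedule above, using a small made-up Keras model and random data rather than the ULMFiT language model itself; each stage unfreezes one more layer from the top and trains for one epoch:

```python
import numpy as np
import tensorflow as tf

# Hypothetical model and data, just to illustrate the unfreezing schedule.
X, y = np.random.rand(500, 20), np.random.randint(0, 2, size=(500,))
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Start with every layer frozen.
for layer in model.layers:
    layer.trainable = False

# Stage k: the top k layers are trainable, everything below stays frozen.
for num_unfrozen in range(1, len(model.layers) + 1):
    for layer in model.layers[-num_unfrozen:]:
        layer.trainable = True
    # Recompile so the new `trainable` flags take effect.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=1, batch_size=32, verbose=0)
```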
%md
[**Bag of Tricks for CNN**](https://arxiv.org/pdf/1812.01187.pdf)
* Use Xavier Initialization
* Learning rate warmup (start with a low LR and gradually increase it to the target LR)
* Increase the learning rate for larger batch sizes
* No weight decay (L2 regularization) on bias terms
* Knowledge Distillation: Train a smaller student model with a larger, more accurate teacher by adding a term to the loss that penalizes the difference between the teacher's and student's softmax outputs
* Label Smoothing: Adjust labels so the target distribution has probability 1 - ε for the correct class and ε/(K − 1) for each incorrect class, where K is the number of classes (see the sketch after this list)
* Image Augmentation:
* Random crops of rectangular areas in image
* Random flips
* Adjust hue, saturation, brightness
* Add PCA noise with a coefficient sampled from a normal distribution
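A minimal NumPy sketch of the label-smoothing formula above (ε = 0.1 and the one-hot label are made up for illustration):

```python
import numpy as np

def smooth_labels(y_onehot, epsilon=0.1):
    """Return targets with 1 - eps for the true class and eps/(K - 1) for each other class."""
    K = y_onehot.shape[-1]
    return y_onehot * (1.0 - epsilon - epsilon / (K - 1)) + epsilon / (K - 1)

y = np.array([[0.0, 0.0, 1.0, 0.0]])  # one-hot label, K = 4 classes
print(smooth_labels(y, epsilon=0.1))  # [[0.0333..., 0.0333..., 0.9, 0.0333...]]
```

Keras offers a similar option through the `label_smoothing` argument of `tf.keras.losses.CategoricalCrossentropy`, though its formula spreads ε/K (rather than ε/(K − 1)) over all classes.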
%md
[**fast.ai best practices**](https://forums.fast.ai/t/30-best-practices/12344)
* Do as much of your work as you can on a small sample of the data
* Batch normalization works best when placed after the ReLU (see the sketch after this list)
* Data Augmentation: Use the right kind of augmentation for the data (e.g. don't flip a cat upside down, but vertical flips are fine for satellite images)
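A minimal sketch (hypothetical layer sizes) of the Conv → ReLU → BatchNorm ordering suggested above, in contrast to the more common Conv → BatchNorm → ReLU:

```python
import tensorflow as tf

# Place BatchNorm after the ReLU, per the tip above.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=3, padding="same", input_shape=(32, 32, 3)),
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.BatchNormalization(),   # normalize the post-activation outputs
    tf.keras.layers.Conv2D(64, kernel_size=3, padding="same"),
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```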
%md-sandbox
© 2020 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>